ABSTRACT

The University of California Santa Cruz (UCSC) 
Genome Browser (http://genome.ucsc.edu) offers 
online public access to a growing database of 
genomic sequence and annotations for a wide 
variety of organisms. The Browser is an integrated 
tool set for visualizing, comparing, analysing and 
sharing both publicly available and user-generated 
genomic datasets. As of September 2012, genomic 
sequence and a basic set of annotation 'tracks' are 
provided for 63 organisms, including 26 mammals, 
13 non-mammal vertebrates, 3 invertebrate 
deuterostomes, 13 insects, 6 worms, yeast and sea 
hare. In the past year 19 new genome assemblies 
have been added, and we anticipate releasing 
another 28 in early 2013. Further, a large number of 
annotation tracks have been either added, updated 
by contributors or remapped to the latest human ref- 
erence genome. Among these are an updated UCSC 
Genes track for human and mouse assemblies. 
We have also introduced several features to 
improve usability, including new navigation menus. 
This article provides an update to the UCSC Genome 
Browser database, which has been previously 
featured in the Database issue of this journal. 



INTRODUCTION 

The University of California Santa Cruz (UCSC) Genome 
Browser (1,2) at http://genome.ucsc.edu is a web-based set 
of tools providing access to a database of genome sequence 
and annotations for visualization, comparison and analysis 
by the scientific, medical and academic communities. Our 
primary mission is to provide timely and convenient open 
access to high-quality human genome sequence and anno- 
tations in a framework that enables easy exploration from 
genome-wide down to the base level. Annotation datasets, 
or 'tracks', on the human genome cover conservation and 
evolutionary comparisons, gene models, regulation, expres- 
sion, epigenetics and tissue differentiation, variation, 
phenotype and disease associations. Our mission extends 
to a number of additional organisms including 6 other 
primates, 19 additional mammals including 3 marsupials 
and 1 monotreme, 13 non-mammalian vertebrates and 
24 invertebrates, each with varying degrees of genome- 
specific annotation. Many of the genomes in our database 
have multiple assembly versions, which support researchers 
who use annotations mapped using older assemblies. 

LOCAL DATASETS 

The Genome Browser locally hosts mapping and sequence 
annotation tracks that describe assembly, gap and GC 
content for all organisms in the browser database. 
Additionally, for most organisms we show alignments
from RefSeq genes (3), mRNAs and ESTs from 
GenBank (4), and other gene or gene prediction tracks 
such as Ensembl Genes (5). For human and mouse 
assemblies, we also offer a locally generated UCSC 
Genes track based upon RefSeq, GenBank, CCDS and 
UniProt data (6,7). About half of the genomes hosted at 
UCSC include a multiple sequence alignment (multiz) 
track (8) and pairwise genomic alignments between 
assemblies to facilitate comparative and evolutionary in- 
vestigations. Expression, regulation, variation and pheno- 
type tracks are available for many of the assemblies. Most 
locally hosted tracks include descriptions with references 
and links to the original contributors or research upon 
which the annotations are based. 

New genome assemblies 

With the abundance of new vertebrate assemblies available
in GenBank, the UCSC Genome Browser team has
streamlined its browser release pipeline in an effort to
keep pace. We have added 19 new assemblies to the
Genome Browser in the past year, including 4 model
organisms (Fugu, mouse, worm and yeast), 7 newly
sequenced organisms (gibbon, lesser hedgehog tenrec,
medium ground finch, naked mole-rat, Tasmanian devil,
turkey and western painted turtle) and 8 updated
assemblies for previously published organisms (chicken,
cow, dog, gorilla, microbat, rat, tammar wallaby and
western clawed frog); see Table 1 for details. We anticipate
the public release of 28 more genome assemblies
in the coming months (Table 2) in support of the new
mouse (GRCm38/mm10) 60-way conservation track. For
a complete list of the genome assemblies included in this
track, refer to the mm10 Conservation track description
page on the Genome Browser website.

New and updated annotations 

Many new datasets were added to the Genome Browser 
this year, and several existing datasets underwent major 
revisions. A significant portion of these were contributed 
by the Encyclopedia of DNA Elements (ENCODE) 
Consortium: we released tracks and downloadable files 
for more than 2300 experiments as the Data 
Coordination Center for the ENCODE Project (9,10), 
described in a companion paper in this issue. 

We published a major update of the UCSC Genes track
(6) for the human assembly (GRCh37/hg19) that includes
more non-coding transcripts based on data from Rfam and
from the tRNA Genes track. We anticipate releasing an
updated UCSC Genes for mm10 in fall of 2012. Rat
Genome Database (RGD) Genes for rat has replaced
UCSC Genes as the main gene track for Baylor 3.4/rn4 (11).

We have updated dbSNP for hg19 to version 135, which
includes interim phase 1 variant calls from the 1000
Genomes project (12). This new version contains additional
annotation data not included in previous dbSNP
tracks, with corresponding coloring and filtering options
in the Genome Browser. We anticipate having dbSNP
version 137 for hg19 available in fall 2012, with
Sequence Ontology (13) terms replacing dbSNP's functional
annotation terms in the display.

Table 1. Assemblies released on the Genome Browser in 2012

Common name | Scientific name | UCSC ID | Sequencing center | Sequencing center ID | Notes
Chicken | Gallus gallus | galGal4 | Int'l Chicken GSC | Gallus_gallus-4.0 |
Cow | Bos taurus | bosTau7 | Cattle GSC | Btau_4.6.1 |
Dog | Canis familiaris | canFam3 | Dog GSC | V3.1 |
Fugu | Takifugu rubripes | fr3 | Int'l Fugu GSC | FUGU5 | RefSeq Genes, 8-species mult. alignment
Gibbon | Nomascus leucogenys | nomLeu1 | Gibbon GSC | Nleu1.0 |
Gorilla | Gorilla gorilla gorilla | gorGor3 | Wellcome Trust Sanger Institute | gorGor3.1 |
Lesser hedgehog tenrec | Echinops telfairi | echTel1 | Broad Institute | EchTel1 |
Medium ground finch | Geospiza fortis | geoFor1 | Genome 10K Project and BGI | GeoFor_1.0 |
Microbat | Myotis lucifugus | myoLuc2 | Broad Institute | Myoluc2.0 |
Mouse | Mus musculus | mm10 | Mouse GRC | GRCm38 | RefSeq Genes, 60-species mult. alignment
Naked mole-rat | Heterocephalus glaber | hetGla1 | BGI | HetGla1.0 |
Rat | Rattus | rn4 | Baylor Human GSC | RGSC_v3.4 |
Tammar wallaby | Macropus eugenii | macEug2 | Tammar Wallaby GSC | Meug_1.1 |
Tasmanian devil | Sarcophilus harrisii | sarHar1 | Wellcome Trust Sanger Institute | Devil_refv7.0 |
Turkey | Meleagris gallopavo | melGal1 | Turkey GSC | Turkey_2.01 |
Western clawed frog | Xenopus (Silurana) tropicalis | xenTro3 | US DOE JGI-PGF | V4.2 |
Western painted turtle | Chrysemys picta bellii | chrPic1 | Int'l Painted Turtle GSC | Chrysemys_picta_bellii-3.0.1 |
Worm | Caenorhabditis elegans | ce10 | WormBase | WS220 | RefSeq Genes, 7-species mult. alignment
Yeast | Saccharomyces cerevisiae | sacCer3 | Saccharomyces Genome Database (SGD) | SacCer_Apr2011 | Ensembl Genes, 7-species mult. alignment

To ensure timely display of data from frequently
updated phenotype and disease association databases we
have automated loading of the following hg19 tracks:
Catalogue Of Somatic Mutations In Cancer (COSMIC),
GeneReviews, GWAS Catalog and Online Mendelian
Inheritance in Man (OMIM) (14-17).

We have added a Publications track that shows DNA 
and protein sequences, SNPs, cytogenetic bands and gene 
symbols which were text-mined from 3 million biomedical 
articles in Elsevier, PubMed Central and other databases 
(18). This track is based on the UCSC Genocoding Project, 
which searches for references to chromosomal locations in 
scientific articles. The annotations in this track link back to 
the original article, thus allowing researchers to identify 
publications relevant to a particular locus (Figure 1). 
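
As a rough illustration of this kind of text mining, the following Python sketch scans a passage for dbSNP identifiers and cytogenetic band names. The regular expressions, the function name find_locus_mentions and the sample sentence are assumptions made for this example; they are not taken from the actual pipeline, which also matches DNA and protein sequences and gene symbols.

```python
import re

# Hypothetical patterns for illustration only.
# dbSNP identifier: "rs" followed by digits.
RSID = re.compile(r"\brs\d+\b")
# Human cytogenetic band such as 1q23.2 or Xp11:
# chromosome, arm (p or q), band, optional sub-band.
BAND = re.compile(r"\b(?:[1-9]|1\d|2[0-2]|X|Y)[pq]\d{1,2}(?:\.\d{1,2})?\b")


def find_locus_mentions(text):
    """Return the dbSNP identifiers and cytogenetic bands mentioned in text."""
    return {
        "snps": sorted(set(RSID.findall(text))),
        "bands": sorted(set(BAND.findall(text))),
    }


# Made-up sentence standing in for a passage from an article.
sample = "The Duffy-null allele (rs2814778) lies in the DARC promoter at 1q23.2."
mentions = find_locus_mentions(sample)
print(mentions)
```

A production system would additionally need to map each match to genome coordinates and filter false positives, which this sketch does not attempt.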



Table 2. Assemblies to be released on the Genome Browser by early 2013

Common name | Scientific name | UCSC ID | Sequencing center | Sequencing center ID
Alpaca | Vicugna pacos | vicPac1 | Broad Institute | VicPac1.0
Armadillo | Dasypus novemcinctus | dasNov3 | Baylor College of Medicine (BCM) | Dasnov3.0
Atlantic cod | Gadus morhua | gadMor1 | Genofisk | GadMor May2010
Baboon | Papio hamadryas | papHam1 | BCM | Pham_1.0
Budgerigar | Melopsittacus undulatus | melUnd1 | Washington University at St. Louis | Melopsittacus undulatus 6.3
Bushbaby | Otolemur garnettii | otoGar3 | Broad Institute | OtoGar3
Cat | Felis catus | felCat5 | International Cat GSC | Felis catus 6.2
Chimpanzee | Pan troglodytes | panTro4 | Chimpanzee SAC | CSAC 2.1.4
Chinese rhesus | Macaca mulatta | rheMac3 | BGI | CR 1.0
Coelacanth | Latimeria chalumnae | latCha1 | Broad Institute | LatCha1
Dolphin | Tursiops truncatus | turTru2 | BCM | Ttru 1.4
Gibbon | Nomascus leucogenys | nomLeu2 | Gibbon GSC | Nleu1.1
Hedgehog | Erinaceus europaeus | eriEur1 | Broad Institute | EriEur1
Kangaroo rat | Dipodomys ordii | dipOrd1 | Broad Institute | DipOrd1.0
Manatee | Trichechus manatus latirostris | triMan1 | Broad Institute | TriManLat1.0
Megabat | Pteropus vampyrus | pteVam1 | Broad Institute | PteVap1.0
Mouse lemur | Microcebus murinus | micMur1 | Broad Institute | MicMur1.0
Naked mole-rat | Heterocephalus glaber | hetGla2 | Broad Institute | HetGla female 1.0
Nile tilapia | Oreochromis niloticus | oreNil1 | Broad Institute | Orenil1.0
Pig | Sus scrofa | susScr3 | International Swine GSC | Sscrofa10.2
Pika | Ochotona princeps | ochPri2 | Broad Institute | OchPri2.0
Rock hyrax | Procavia capensis | proCap1 | Broad Institute | ProCap1.0
Shrew | Sorex araneus | sorAra1 | Broad Institute | SorAra1
Sloth | Choloepus hoffmanni | choHof1 | Broad Institute | ChoHof1.0
Squirrel | Spermophilus tridecemlineatus | speTri2 | Broad Institute | SpeTri2.0
Squirrel monkey | Saimiri boliviensis | saiBol1 | Broad Institute | SaiBol1.0
Tarsier | Tarsius syrichta | tarSyr1 | Broad Institute | TarSyr1.0
Tree shrew | Tupaia belangeri | tupBel1 | Broad Institute | TupBel1


[Figure 1 image: Genome Browser screenshot of human Feb. 2009 (GRCh37/hg19), chr1:159,174,437-159,175,227 (791 bp), with the tracks UCSC Genes, Simple Nucleotide Polymorphisms (dbSNP 135), Sequences in Articles: PubMed Central and Elsevier, and SNPs in Publications.]

Figure 1. Genome Browser image of the promoter region of DARC on human assembly hg19 including UCSC Genes, dbSNP 135 and the
Publications track showing sequences and SNPs text-mined from PubMed Central and Elsevier. The region shown includes a SNP responsible
for the Duffy blood group (rs2814778). The Publications track contains sequences in this region from several articles relevant to this SNP. Note that
hovering the mouse over a sequence shows the title of the corresponding article. Clicking on a sequence in the Publications track takes the user to a
page with details about the relevant article.
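
The region shown in Figure 1, chr1:159,174,437-159,175,227, is reported as 791 bp because browser positions are 1-based and include both endpoints. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the parser accepts only the simple chrom:start-end form.

```python
import re

# Matches positions such as "chr1:159,174,437-159,175,227".
POSITION = re.compile(r"^(chr[\w.]+):([\d,]+)-([\d,]+)$")


def parse_position(position):
    """Split a 'chrom:start-end' string into (chrom, start, end) with integer coordinates."""
    match = POSITION.match(position.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


def span_length(start, end):
    """Number of bases in a 1-based interval that includes both endpoints."""
    return end - start + 1


chrom, start, end = parse_position("chr1:159,174,437-159,175,227")
print(chrom, start, end, span_length(start, end))
```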

We have added four public track hubs for hg19 from
external data providers (see below for more details on
track hubs): the ENCODE Analysis hub contains descriptions
of ENCODE data in uniformly processed signal and
element representations, as well as genome segmentations
(19); the UMassMed ZHub contains H3K4me3 ChIP-seq
data for autistic brains (20); the Expression & PolyA
Database (xPAD) hub contains a map of polyadenylation
sites in cancer tissues and tumor cell lines (21); and the
miRcode hub contains predicted microRNA target sites
in GENCODE transcripts (22).



SOFTWARE IMPROVEMENTS 

We made several changes to the interface of the Genome 
Browser in 2012 based on suggestions from our users. All 
pages now display a menu bar to make it easier to access 
features and navigate around the website in a consistent 
way. We have changed the fonts and background to 
improve usability. The annotation search and gene 
suggest box have been combined, and we have added 
descriptions to the gene suggestion list. We have 
changed the way users log in when saving sessions; this 
change simplifies the login procedure and also removes the 
dependency on MediaWiki, which makes it easier for
Genome Browser mirrors to support saved sessions. 

We introduced support for the Variant Call Format
(VCF) in 2011 (23). This year we improved VCF support
with a haplotype sorting display. VCF can optionally represent
phased genotypes, i.e. the two alleles of each diploid
genotype have been assigned to two haplotypes, one inherited
from each parent. For VCF files that contain
phased genotypes from multiple samples, we have developed
an advanced display to highlight local patterns of genetic
linkage between variants. The display features the clustering
of independent haplotypes within the viewed region. The
goal of the clustering is to visually group co-occurring
allele sequences in haplotypes, so local patterns of linkage
can be easily discerned. The clustering does not indicate
relatedness of individuals, but merely local composition of
mostly ancient haplotype blocks. We anticipate adding 1000
Genomes Phase 1 variant calls with phased genotypes for
1092 individuals using this display in fall 2012.
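
To make the notion of phased genotypes concrete, the sketch below splits the GT field of one VCF data line into per-haplotype allele indices. In VCF, a phased genotype separates its alleles with "|" (for example 0|1), whereas an unphased one uses "/". The function name and the data line are invented for illustration, and the code assumes diploid samples with no missing calls.

```python
def split_phased_genotypes(vcf_line):
    """Return (variant_id, alleles), with one allele index per haplotype.

    Each diploid sample contributes two consecutive entries to alleles.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 ...
    variant_id, fmt, samples = fields[2], fields[8], fields[9:]
    gt_index = fmt.split(":").index("GT")
    alleles = []
    for sample in samples:
        genotype = sample.split(":")[gt_index]
        if "|" not in genotype:
            raise ValueError(f"genotype {genotype!r} is not phased")
        alleles.extend(int(allele) for allele in genotype.split("|"))
    return variant_id, alleles


# Invented data line: three samples, hence six haplotypes.
line = "chr5\t100\tvar1\tT\tC\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0"
variant_id, alleles = split_phased_genotypes(line)
print(variant_id, alleles)
```

Here 0 denotes the reference allele and 1 the first alternate allele, so the three samples yield the haplotype alleles 0, 1, 1, 1, 0, 0 for this variant.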

In the haplotype sorting display (Figure 2), independent 
haplotypes are shown horizontally, and variants are 
vertical bars with reference alleles in white (invisible) 
and alternate alleles in black. A variant for which most 
haplotypes have the reference allele will be mostly white 
(invisible); tick marks at the top and bottom of each 
variant make such variants easier to see. Haplotypes are 
clustered by similarity weighted by proximity to a central 
variant, which is outlined in purple. In order to limit 
compute time, only a small number of variants are used 
for clustering; these variants have purple tick marks above 
and below. The clustering tree is drawn in the left label 
area, and is used to order the haplotypes from top to 
bottom. When a rightmost branch in the clustering tree 
is purple, it means that all haplotypes in the branch are 
identical, at least in the variants used for clustering. 
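
The ordering step can be sketched as follows. This is not the Genome Browser's implementation: the text above says only that haplotypes are clustered by similarity weighted by proximity to a central variant, so the 1/(1 + d) weight on index distance, the average-linkage merging and all function names are assumptions chosen to keep the example short.

```python
from itertools import combinations


def proximity_weights(num_variants, center):
    """Assumed decay: variants nearer the central variant count more."""
    return [1.0 / (1 + abs(i - center)) for i in range(num_variants)]


def weighted_distance(hap_a, hap_b, weights):
    """Sum the weights of the variants at which two haplotypes differ."""
    return sum(w for a, b, w in zip(hap_a, hap_b, weights) if a != b)


def cluster_order(haplotypes, center):
    """Return haplotype indices in the order produced by agglomerative clustering."""
    weights = proximity_weights(len(haplotypes[0]), center)
    # Each cluster is a list of haplotype indices kept in display order.
    clusters = [[i] for i in range(len(haplotypes))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(
            weighted_distance(haplotypes[i], haplotypes[j], weights)
            for i in c1
            for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the two closest clusters.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


# Four haplotypes over three variants; 0 = reference, 1 = alternate allele.
haplotypes = [[0, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1]]
order = cluster_order(haplotypes, center=1)
print(order)
```

In this toy input the two identical all-reference haplotypes end up adjacent, as do the two haplotypes that carry the alternate allele at the central variant.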

In 2011 we introduced support for track data hubs, 
which are web-accessible directories of genomic data 
that can be viewed in the UCSC Genome Browser along- 
side the annotation tracks hosted by UCSC (2). This tech- 
nology has many advantages: it allows researchers to 
combine and configure large numbers of datasets for presentation
as a single entity, it improves performance by
allowing the Genome Browser to retrieve data only 
when necessary, and it allows researchers to share a col- 
lection of data with colleagues as a private data hub. 
Track hub usage increased greatly in 2012; by
September 2012 more than 2000 track hubs were in use.
There is also a growing trend in the research community
to use track hubs to collect and organize data for
presentation in publications. UCSC has extended the
documentation for track hubs on the Genome Browser website
(http://genome.ucsc.edu/goldenPath/help/trackDb/trackDbDoc.html)
to facilitate their use.

[Figure 2 image: Genome Browser screenshot of a region of chr5 on human Feb. 2009 (GRCh37/hg19), with the tracks UCSC Genes (RefSeq, UniProt, CCDS, Rfam, tRNAs & Comparative Genomics), 1000 Genomes Phase 1 Integrated Variant Calls: SNVs, InDels, SVs, H3K27Ac Mark (Often Found Near Active Regulatory Elements) on 7 cell lines from ENCODE, and Mammal Cons. Mouse-over text: rs2549005: T/T:156, T/C:479, C/C:457.]

Figure 2. Genome Browser image of the promoter region and transcription start of IRF1 on human assembly hg19 showing UCSC Genes, 1000
Genomes Phase 1 Integrated Variant Calls in the haplotype sorting VCF display mode, histone mark H3K27Ac binding in overlays of 7 ENCODE
cell lines and PhyloP conservation scores from alignments of placental mammals. Mouse-over text gives the dbSNP identifier and genotype counts for
one of the 1000 Genomes variants. The variant outlined in purple is used as the center variant for clustering haplotypes by similarity, and is clearly in
linkage with nearby variants. Wider purple triangular leaves of the clustering tree indicate more common local haplotypes. Note that the reference
genome haplotype (horizontal run of invisible reference alleles) is often not the major haplotype among the 1000 Genomes Phase 1 samples.
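
As an illustration of what a track hub directory contains, the sketch below generates the three small text files that, as we understand the track hub documentation, describe a hub: hub.txt, genomes.txt and a per-genome trackDb.txt. The hub name, track name, data file name and e-mail address are placeholders, and the Python helper functions are hypothetical; only the file names and setting keys follow the hub format.

```python
from pathlib import Path


def stanza(settings):
    """Render one stanza as 'key value' lines, one setting per line."""
    return "".join(f"{key} {value}\n" for key, value in settings.items())


def build_hub(hub_name, genome, tracks, email):
    """Return {relative path: file text} for a single-genome hub."""
    hub_txt = stanza({
        "hub": hub_name,
        "shortLabel": hub_name,
        "longLabel": f"{hub_name} example track hub",
        "genomesFile": "genomes.txt",
        "email": email,
    })
    genomes_txt = stanza({"genome": genome, "trackDb": f"{genome}/trackDb.txt"})
    # Track stanzas in trackDb.txt are separated by a blank line.
    track_db = "\n".join(stanza(track) for track in tracks)
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{genome}/trackDb.txt": track_db,
    }


def write_hub(files, root):
    """Write the generated files under root, creating directories as needed."""
    for relative_path, text in files.items():
        path = Path(root) / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)


files = build_hub(
    hub_name="exampleHub",
    genome="hg19",
    tracks=[{
        "track": "exampleSignal",
        "bigDataUrl": "exampleSignal.bw",  # placeholder data file
        "shortLabel": "Example signal",
        "longLabel": "Hypothetical signal track for illustration",
        "type": "bigWig",
    }],
    email="someone@example.org",
)
print(files["hub.txt"])
```

Calling write_hub(files, "myHub") would place the files in a local directory, which would then need to be copied, together with the data files, to a web-accessible location before the hub could be loaded.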


FUTURE DIRECTIONS

We will continue to add new and updated genome 
assemblies for vertebrate and other selected model organ- 
isms as they become available. Only assemblies registered 
and deposited in NCBI's GenBank will be considered for
hosting at UCSC, as stipulated in the Browser Genome 
Release Agreement instituted by NCBI, Ensembl and 
UCSC. Many researchers have expressed interest in 
using the Genome Browser to visualize and analyse 
assemblies that are not deposited at NCBI. To assist 
such research, we intend to develop support for 
assembly data hubs, which will enable the genomics com- 
munity to easily extend the Genome Browser to display 
genome assemblies that we are unable to integrate into our 
own database. The assembly data hub will be similar in 
concept to the track data hub: the data provider will store 
the genome sequence in a compressed, binary, indexed file 
format and make it available on a remote web server along 
with a list of tracks that annotate that genome. 

We plan to add or update several annotation tracks in the 
upcoming year, including a coverage/mapability track based 
on 1000 Genomes project data, an updated recombination 
rate and UCSC Genes track for the human genome, an 
updated ORFeome track for zebrafish, a mouse strain 
variant track, segmental duplication tracks for several 
assemblies, and more selected personal genomes in the 
human Personal Genome Variants track. We will also 
continue to incorporate selected datasets from the 
ENCODE project that are of general interest to our users. 

We are developing a tool for integrating diverse anno- 
tations in our databases with user-provided genomic 
variants, to assist with analysis and prioritization of 
variants discovered via sequencing. We will finish 
support for VCF in track hubs. We also plan to implement
a supported mirror in Germany to improve access
speed for European users of the Genome Browser. 



CONTACTING US 

We have two public, moderated mailing lists for user
support: genome@soe.ucsc.edu for general questions
about the Genome Browser and genome-mirror@soe.ucsc.edu
for questions specific to the setup and maintenance
of Genome Browser mirrors. Archives of both
lists are searchable from our contacts page at
http://genome.ucsc.edu/contacts.html. You may also reach us
at genome-www@soe.ucsc.edu, the preferred address for
inquiring about mirror site licenses and reporting server
errors.